Modeling Speech Emotion Recognition via Attention-Oriented Parallel CNN Encoders

Abstract

Accurately learning human emotions from speech is an indispensable function of modern speech emotion recognition (SER) models. However, deriving and interpreting the various crucial features of raw speech data are complex modeling tasks on which performance depends. Therefore, in this study, we developed a novel SER model based on attention-oriented parallel convolutional neural network (CNN) encoders that acquire, in parallel, the important features used for emotion classification. Specifically, Mel-frequency cepstral coefficient (MFCC), paralinguistic, and speech spectrogram features were extracted and encoded by a separate CNN architecture designed for each feature type; the encoded features were then fed to attention mechanisms for further representation and finally classified. Experiments on the open EMO-DB and IEMOCAP datasets showed that the proposed model is more efficient than the baseline models. In particular, the proposed model achieved a weighted accuracy (WA) of 71.8% and an unweighted accuracy (UA) of 70.9% on EMO-DB, and a WA of 72.4% and a UA of 71.1% on IEMOCAP.
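The abstract describes three feature streams, each with its own CNN encoder, followed by attention and classification. The NumPy sketch below illustrates that general pattern only; it is not the authors' architecture. Everything concrete in it is an assumption: the helper names (conv1d, attention_pool, make_encoder, encode), the layer widths and kernel sizes, the simple dot-product attention pooling applied per branch, the 25-dimensional frame-level paralinguistic stream, and the seven-class output. Weights are random, not trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d(x, w, b):
    """'Valid' 1-D convolution over time.

    x: (T, C_in) frame-level features, w: (K, C_in, C_out), b: (C_out,)
    returns (T - K + 1, C_out)
    """
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, w.shape[2]))
    for t in range(t_out):
        # Contract the (K, C_in) window against the kernel -> (C_out,)
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def attention_pool(h, v):
    """Dot-product attention pooling over time (hypothetical form).

    h: (T, D) encoded frames, v: (D,) query vector -> (D,)
    """
    alpha = softmax(h @ v)  # one weight per frame, weights sum to 1
    return alpha @ h        # attention-weighted average of the frames


def make_encoder(c_in, c_out, k):
    # Randomly initialized parameters; a real model would learn these.
    return {
        "w": rng.normal(0.0, 0.1, (k, c_in, c_out)),
        "b": np.zeros(c_out),
        "v": rng.normal(0.0, 0.1, c_out),
    }


def encode(x, p):
    h = np.maximum(conv1d(x, p["w"], p["b"]), 0.0)  # conv + ReLU
    return attention_pool(h, p["v"])


# Hypothetical inputs: (frames, feature dimension) for each stream.
inputs = {
    "mfcc": rng.normal(size=(100, 40)),
    "paralinguistic": rng.normal(size=(100, 25)),
    "spectrogram": rng.normal(size=(100, 128)),
}
# One independent encoder per stream, each with its own kernel size.
encoders = {
    "mfcc": make_encoder(40, 32, 3),
    "paralinguistic": make_encoder(25, 16, 3),
    "spectrogram": make_encoder(128, 64, 5),
}

n_emotions = 7  # assumption, e.g. the seven EMO-DB emotion categories
fused = np.concatenate([encode(inputs[name], encoders[name]) for name in inputs])
w_out = rng.normal(0.0, 0.1, (fused.shape[0], n_emotions))
probs = softmax(fused @ w_out)
print(fused.shape, probs.shape)  # (112,) (7,)
```

Running the encoders side by side keeps each feature type in its own representation until fusion; attention pooling also turns variable-length utterances into fixed-size vectors, so the streams could differ in frame count.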

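The abstract reports results as WA and UA without defining them. In SER work, WA is commonly taken as overall accuracy and UA as the mean of per-class recalls; this sketch assumes those standard definitions. The function names and the toy labels are made up for illustration and do not come from EMO-DB or IEMOCAP.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean per-class recall, so every emotion counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, invented labels in which "neutral" dominates.
y_true = ["neutral"] * 6 + ["anger"] * 2 + ["sadness"] * 2
y_pred = ["neutral"] * 6 + ["anger", "neutral"] + ["neutral", "neutral"]
print(weighted_accuracy(y_true, y_pred))    # 0.7
print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

Because UA averages recall per class, a model that leans toward a majority class (here, neutral) scores noticeably lower on UA than on WA, which is why SER papers usually report both.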
Similar resources

Adversarial Auto-Encoders for Speech Based Emotion Recognition

Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in the machine learning community due to their exceptional performance in tasks such as digit classification and face recognition. They map the autoencoder’s bottleneck layer output (termed as code vectors) to different noise Probability Distribution Functions (PDFs), that can be further regularize...

Emotion Recognition via Continuous Mandarin Speech

Emotion plays a significant role in cognitive psychology, behavioural sciences and humanoid robot design. The continuing improvements in speech recognition technology have led to many new and fascinating applications in human-computer interaction, context aware computing and computer mediated communication. A growing number of research studies in emotion recognition via an isolated short senten...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Emotion Recognition from Speech

This paper proposes the classification of emotions based on spectral features using the Gaussian Mixture Model as the classifier. The performance of the Gaussian Mixture Model has been evaluated for two types of databases – acted and real-life speech corpuses. The model has also been evaluated for the variation in its performance based on the speaker, gender of the speaker and the number of the ...

Journal

Journal title: Electronics

Year: 2022

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics11234047